# SigLIP Vision Encoder
| Model | Author | License | Task | Library | Description | Downloads | Likes |
|---|---|---|---|---|---|---|---|
| `vit_so400m_patch14_siglip_gap_448.pali_mix` | timm | Apache-2.0 | Text-to-Image | Transformers | Vision-language model built on the SigLIP image encoder with global average pooling, suited to multimodal tasks. | 15 | 0 |
| `vit_large_patch16_siglip_384.webli` | timm | Apache-2.0 | Image Classification | Transformers | SigLIP Vision Transformer containing only the image encoder, with the original attention pooling; suited to image feature extraction. | 64 | 0 |
| `vit_base_patch16_siglip_384.webli` | timm | Apache-2.0 | Image Classification | Transformers | SigLIP Vision Transformer containing only the image encoder, with the original attention pooling. | 64 | 1 |
| `vit_so400m_patch14_siglip_224.webli` | timm | Apache-2.0 | Image Classification | Transformers | SigLIP Vision Transformer containing only the image encoder, with the original attention pooling. | 123 | 1 |
| nanoLLaVA-1.5 | qnguyen3 | Apache-2.0 | Image-to-Text | Transformers (English) | Vision-language model with under 1 billion parameters, designed for edge devices; compact yet powerful. | 442 | 109 |